Abstract. Tree alignment graphs (TAGs) provide an intuitive data
structure for storing phylogenetic trees that exhibits the relationships
of the individual input trees and can potentially account for nested taxonomic
relationships. This paper provides a theoretical foundation for
the use of TAGs in phylogenetics. We provide a formal definition of the
TAG that, unlike the previous definition, does not depend on the order
in which input trees are provided. In the consensus case, when all input
trees have the same leaf labels, we describe algorithms for constructing
majority-rule and strict consensus trees using the TAG. When the input
trees do not have identical sets of leaf labels, we describe how to determine
whether the input trees are compatible and, if they are compatible, how to
construct a supertree that contains the input trees.

1 Introduction 

Phylogenetic trees are graphs depicting the evolutionary relationships among
species; thus, they are powerful tools for examining fundamental biological
questions and understanding biodiversity (e.g., [?]). The wealth of available genetic
sequences has rapidly increased the number of phylogenetic studies from across
the tree of life (e.g., [?]). For example, STBase contains a million species trees
generated from sequence data in GenBank [14]. New next-generation sequencing
technologies and sequence capture methods (e.g., [10,12,13]) will further
increase the rate at which phylogenetic data are generated in the coming years.
This continuous flow of new phylogenetic data necessitates new approaches to
store, evaluate, and synthesize existing phylogenetic trees.

Recently, Smith et al. introduced tree alignment graphs (TAGs) as a way to
analyze large collections of phylogenetic trees [19]. TAGs preserve the structure
of the input trees and thus provide an intuitive, interpretable representation
of the input trees, which enables users to visually assess patterns of agreement
and conflict. The TAG structure also makes it possible to combine trees whose
tips include nested taxa (e.g., the tips of one tree contain species in taxonomic
families, while the tips of another tree contain the families), which was true
of only a few previous synthesis approaches [?]. Indeed, a TAG was used
to merge a taxonomy of all ~2.3 million named species with ~500 published
phylogenetic trees to obtain an estimate of the tree of life [19].
The original TAG definition of Smith et al. [19] depends on the order of the
input trees, which can be problematic. Further, several details of the synthesis
process were not specified. Our aim in this paper is to lay the theoretical
foundations for further research on TAGs. To this end, we first provide a
mathematically precise definition of TAGs which is independent of the order of the
input trees (Section 2), and develop an algorithm for constructing TAGs
(Section 3). We also describe algorithms to build strict and majority-rule consensus
trees using TAGs (Section 4). We show how to check the compatibility among
input trees and construct a supertree from compatible phylogenetic trees using a
TAG (Section 5). Finally, we discuss future applications and problems associated
with using TAGs for assessing and synthesizing the enormous and rapidly
growing number of available phylogenetic trees (Section 6).

Related work. TAGs are part of a long history of using graph structures to
synthesize the relationships among phylogenetic trees with partial taxonomic
overlap. The classic example is the Build algorithm [?] and its later variations
(e.g., [5,6,8,15]). These methods yield polynomial-time algorithms to determine
whether a collection of input trees is compatible, and, if so, output the parent
tree(s). Other graph-based algorithms, such as the MinCutSupertree [18],
the Modified MinCutSupertree [?], or the MultiLevelSupertree [3]
algorithm, allow users to synthesize collections of conflicting phylogenetic trees.
Although TAGs share important features with these earlier graph-theoretic
approaches, TAGs display more directly the phylogenetic relationships exhibited
by the input trees and therefore provide a more intuitive framework to examine
patterns of conflict among trees [?]. TAGs also potentially summarize the
information in the input trees with fewer nodes than previous graphs for semi-labeled
trees [5,8].

2 Preliminaries 

2.1 Notation 

Let $T$ be a rooted tree. Then, $rt(T)$ and $\mathcal{L}(T)$ denote, respectively, the root and
the leaf set of $T$, and $V(T)$ and $E(T)$ denote, respectively, the set of vertices and
the set of edges of $T$. The set of all internal vertices of $T$ is $I(T) := V(T) \setminus \mathcal{L}(T)$.
We define $\le_T$ to be the partial order on $V(T)$ where $x \le_T y$ if $y$ is a vertex on
the path from $rt(T)$ to $x$. If $\{x, y\} \in E(T)$ and $x \le_T y$, then $y$ is the parent of
$x$ and $x$ is a child of $y$. Two vertices in $T$ are siblings if they share a parent.

Let $X$ be a finite set of labels. A phylogenetic tree on $X$ is a pair $\mathcal{T} = (T, \varphi)$
where 1) $T$ is a rooted tree in which every internal vertex has degree at least
three, except $rt(T)$, which has degree at least two, and 2) $\varphi$ is a bijection from
$\mathcal{L}(T)$ to $X$ [?]. Tree $T$ is called the underlying tree of $\mathcal{T}$ and $\varphi$ is called the
labeling map of $\mathcal{T}$. For convenience, we will often assume that the set of labels
$X$ of $\mathcal{T}$ is simply $\mathcal{L}(\mathcal{T})$. The size of $\mathcal{T}$ is the cardinality of $\mathcal{L}(\mathcal{T})$. $\mathcal{T}$ is binary (or
fully resolved) if every vertex $v \in I(T) \setminus \{rt(T)\}$ has degree three and $rt(T)$ has
degree two.
Let $\mathcal{T} = (T, \varphi)$ be a phylogenetic tree on $X$ and let $v$ be any vertex in $V(T)$.
The subtree of $T$ rooted at vertex $v \in V(T)$, denoted by $T_v$, is the tree induced
by $\{u \in V(T) : u \le_T v\}$. The cluster at $v$, denoted $C_{\mathcal{T}}(v)$, is the set of leaf labels
$\{\varphi(u) \in X : u \in \mathcal{L}(T_v)\}$. We write $\mathcal{H}(\mathcal{T})$ to denote the set of all clusters of $\mathcal{T}$.
Note that $\mathcal{H}(\mathcal{T})$ includes trivial clusters; i.e., clusters of size one or $|X|$.
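
As an illustration of the cluster notation, the following minimal sketch computes the set of all clusters of a small tree. It assumes a representation that is not part of the paper: a leaf is a string label and an internal vertex is a tuple of its children. The function name `clusters` is likewise hypothetical.

```python
def clusters(tree):
    """Return (cluster at the root, set of all clusters) for a nested-tuple tree."""
    if isinstance(tree, str):              # leaf: the cluster is its own label
        c = frozenset([tree])
        return c, {c}
    root_cluster, all_clusters = frozenset(), set()
    for child in tree:                     # children are handled before the parent
        c, sub = clusters(child)
        root_cluster |= c
        all_clusters |= sub
    all_clusters.add(root_cluster)
    return root_cluster, all_clusters


# The tree ((a,b),c): three trivial singleton clusters, {a,b}, and {a,b,c}.
root, H = clusters((("a", "b"), "c"))
for c in sorted(H, key=lambda c: (len(c), sorted(c))):
    print(sorted(c))
```

The trivial clusters (the singletons and the full leaf set) are included in `H`, matching the convention stated above.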


2.2 Tree Alignment Graphs 

Here we define the tree alignment graph (TAG). Our definition is somewhat
different from that of Smith et al. [19]. We explain these differences later.

We first need an auxiliary notion. A directed multi-graph is a directed graph
that is allowed to have multiple edges between the same two vertices. More
formally, a directed multi-graph is a pair $(V, E)$ of disjoint sets (of vertices and
edges) together with two maps $\mathrm{init} : E \to V$ and $\mathrm{ter} : E \to V$ assigning to each
edge $e$ an initial vertex $\mathrm{init}(e)$ and a terminal vertex $\mathrm{ter}(e)$ [9]. Edge $e$ is said
to be directed from $\mathrm{init}(e)$ to $\mathrm{ter}(e)$. The in-degree of a node $v$ is the number of
edges $e$ such that $\mathrm{ter}(e) = v$; the out-degree of $v$ is the number of edges $e$ such
that $\mathrm{init}(e) = v$. We call a node with out-degree zero a leaf node; all non-leaf
nodes are called internal nodes.

A directed acyclic (multi-) graph, DAG for short, is a directed multi-graph 
with no cycles. 

Definition 1 (Tree Alignment Graph (TAG)). Let $V$ be a collection of
phylogenetic trees and $S = \bigcup_{\mathcal{T} \in V} \mathcal{L}(\mathcal{T})$. The tree alignment graph of $V$ is a
directed graph $D = (U, E)$ along with an injective function $f : U \to 2^S$, called
the vertex-labeling function, such that

- for each $v \in U$, $f(v) \in \mathcal{H}(\mathcal{T})$ for some $\mathcal{T} \in V$, and

- for each $\mathcal{T} := (T, \varphi) \in V$ and each $e := \{x, y\} \in E(T)$ where $x \le_T y$, there
exists a unique $e' \in E$ such that $C_{\mathcal{T}}(x) = f(\mathrm{ter}(e'))$ and $C_{\mathcal{T}}(y) = f(\mathrm{init}(e'))$.

Figure 1 illustrates Definition 1.

Remarks: 

1. Note that we only use the vertex-labeling function $f$ to facilitate the definition
and to label the leaf nodes of the TAG. We do not actually label the
internal vertices of the TAG, since, as the TAG gets bigger, assigning labels
using $f$ becomes impractical.

2. Having a unique edge in the TAG for each input tree edge makes it possible to
systematically annotate the TAG for each individual input tree, as the TAG also
provides a means for storing phylogenetic trees.

Lemma 1. The TAG is acyclic. 

Proof. This follows from the fact that for each edge $e \in E$, $f(\mathrm{ter}(e)) \subset f(\mathrm{init}(e))$. □
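
Definition 1 and Lemma 1 can be illustrated with a short sketch. It assumes, purely for illustration, that trees are nested tuples with string leaves and that each TAG node is identified directly with its cluster (so the labeling function $f$ is the identity); `build_tag` is a hypothetical name. One edge is recorded per input tree edge, so parallel edges arise when two trees share an edge.

```python
def build_tag(trees):
    """Return (nodes, edges); each edge is (parent cluster, child cluster, tree index)."""
    nodes, edges = set(), []

    def visit(t, idx):
        if isinstance(t, str):
            c = frozenset([t])
        else:
            kids = [visit(child, idx) for child in t]
            c = frozenset().union(*kids)
            # one TAG edge for each edge of input tree number idx
            edges.extend((c, kid, idx) for kid in kids)
        nodes.add(c)
        return c

    for idx, t in enumerate(trees):
        visit(t, idx)
    return nodes, edges


trees = [(("a", "b"), "c"), (("a", "b"), "d")]
nodes, edges = build_tag(trees)
# Lemma 1: every edge points from a cluster to a proper subset of it,
# so no directed cycle can exist.
assert all(child < parent for parent, child, _ in edges)
```

Because the node set is just the set of distinct clusters, the result does not depend on the order in which the trees are listed.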
Fig. 1. A collection $V$ of phylogenetic trees $\mathcal{T}_1$, $\mathcal{T}_2$, and $\mathcal{T}_3$ and the TAG $D$ of $V$. The
edges of $D$ are colored so as to correspond to the input trees; the internal vertices are
labeled for clarity.


Comparison with Smith et al.’s TAG. In [19], Smith et al. define their TAG
procedurally, as follows. Let $V$ be a collection of phylogenetic trees and
$S = \bigcup_{\mathcal{T} \in V} \mathcal{L}(\mathcal{T})$. Let $D = (U, E)$ be a directed graph along with an injective
vertex-labeling function $f : U \to 2^S$. Initially, $D$ has a vertex $f^{-1}(S)$ and $|S|$ vertices
$f^{-1}(\{s\})$, one for each $s \in S$. Next, Smith et al.’s method processes each input tree
$\mathcal{T} = (T, \varphi) \in V$, in some order, and does the following:

1. Map each vertex $v \in \mathcal{L}(T)$ to the vertex $u \in U$ where $\{\varphi(v)\} = f(u)$.

2. Map each vertex $v \in I(T)$ to the vertex $u \in U$, where $C_{\mathcal{T}}(v) \cap f(u) \neq \emptyset$,
$(\mathcal{L}(\mathcal{T}) \setminus C_{\mathcal{T}}(v)) \cap f(u) = \emptyset$, and $C_{\mathcal{T}}(v) \cap (S \setminus f(u)) = \emptyset$. If no such $u$ exists, then
add a new vertex $u$ with $f(u) := C_{\mathcal{T}}(v)$ to $D$.

3. In the case of a vertex $v \in I(T)$ mapping to multiple vertices in $U$, for each
set of $t$ such vertices $u_1, \ldots, u_t \in U$, where for each $j \in \{1, \ldots, t-1\}$ there exists
$e' \in E$ such that $\mathrm{ter}(e') = u_j$ and $\mathrm{init}(e') = u_{j+1}$, discard all mappings of $v$
to $u_2, \ldots, u_t$, keeping only $u_1$. Note that $v \in V(T)$ can still be mapped to multiple
vertices in $D$. For example, vertex $v$ of $\mathcal{T}_3$ in Fig. 1 is mapped to vertices
$f^{-1}(\{a, b, c\})$ and $f^{-1}(\{a, b, d\})$ of $D_1$ in Fig. 2.

4. For each edge $e = \{x, y\} \in E(T)$, add directed edges in $D$ from all mappings of $x$
to all mappings of $y$.

Observe that Smith et al.’s definition of the TAG coincides with Definition 1
when the input trees have completely overlapping leaf label sets; however,
it differs when the input trees have partially overlapping leaf label sets, as we
discuss next.

Notice that Step 2 first tries to map a vertex $v \in I(T)$ to a vertex $u \in U$ for
which $C_{\mathcal{T}}(v) \subseteq f(u)$, and $f(u)$ does not have any label of $\mathcal{L}(\mathcal{T})$ other than those
of $C_{\mathcal{T}}(v)$. If no suitable match exists, then a new vertex $f^{-1}(C_{\mathcal{T}}(v))$ is added to
$D$. As a result, the set of vertices in the TAG that this procedure creates can
depend on the order of input trees. Fig. 2 illustrates how changing the order
of input trees can lead to different TAGs for the input trees of Fig. 1. When
$\mathcal{T}_3$ is processed after $\mathcal{T}_1$ and $\mathcal{T}_2$, the vertex $v$ of $\mathcal{T}_3$ maps to $f^{-1}(\{a,b,c\})$ and
$f^{-1}(\{a,b,d\})$ in Step 2. On the other hand, processing $\mathcal{T}_3$ before $\mathcal{T}_1$ and $\mathcal{T}_2$
necessitates the creation of $f^{-1}(\{a,b\})$ in the resulting TAG.

Smith et al. discussed the possibility of order dependence of their TAG and
addressed it through a post-processing procedure [19]. For the given collection
of input trees and the TAG that results from the first round of processing, the
post-processing procedure recomputes the mapping of each internal vertex of each
input tree following Step 2. If the new mapping of an internal vertex of the input
tree differs from the old mapping, then the mapping is updated. Edges of the
resulting TAG that correspond to the outdated mapping are removed and edges
for the new mapping are added. For example, post-processing leaves TAG
$D_1$ unchanged. In contrast, post-processing
maps $v \in I(T_3)$ to $f^{-1}(\{a,b,c\})$ and $f^{-1}(\{a,b,d\})$ in $D_2$. This new mapping
adds directed edges 1) from $f^{-1}(\{a,b,c,d,e\})$ to $f^{-1}(\{a,b,c\})$ and
$f^{-1}(\{a,b,d\})$, 2) from $f^{-1}(\{a,b,c\})$ to $f^{-1}(\{a\})$ and $f^{-1}(\{b\})$, and 3) from
$f^{-1}(\{a,b,d\})$ to $f^{-1}(\{a\})$ and $f^{-1}(\{b\})$ in $D_2$. The TAG that results from
applying post-processing to $D_2$ clearly differs from $D_1$. We note
that since the post-processing is inadequate for overcoming order dependence,
an algorithm for pre-processing of input trees is in development*. In contrast to
Smith et al.’s TAG [19], our TAG (in Definition 1) is independent of the order
of input trees.




Fig. 2. Following [19], if the input trees from Fig. 1 are processed in the order $\mathcal{T}_1$, $\mathcal{T}_2$,
and $\mathcal{T}_3$, the resulting TAG is $D_1$; changing the order of input trees, and processing
them as $\mathcal{T}_3$, $\mathcal{T}_1$, and $\mathcal{T}_2$, results in $D_2$, which is different from $D_1$. Again, the edges
of both TAGs are colored so as to correspond to the input trees; the internal vertices are
labeled for clarity.


The input trees in [19] always include a taxonomy tree, which contains all
of the leaf labels from the input trees and could be a star tree (i.e., all the leaf
nodes connecting to a root node) if no taxonomic classification is available. Here
we study the TAG in the standard supertree framework, so we do not assume
that a taxonomy tree is included.

* S. A. Smith, J. W. Brown, and C. E. Hinchliff (Department of Ecology and Evolutionary
Biology, University of Michigan, Ann Arbor), personal communication.


3 Constructing a TAG 

We now present an algorithm for constructing the TAG for a collection of
phylogenetic trees. The algorithm first collects clusters by reading through the input
trees and making each unique cluster a node in the TAG. It then adds edges
between nodes in the TAG.

Let $V$ be the input collection of $k$ phylogenetic trees. Let $S = \bigcup_{\mathcal{T} \in V} \mathcal{L}(\mathcal{T})$
and $n = |S|$.


3.1 Building TAG Nodes 

We define a bijection $g$ that maps each taxon in $S$ to a unique number in
$\{1, 2, \ldots, n\}$. For each tree $\mathcal{T} = (T, \varphi)$ in $V$, the bit-string of $v \in V(T)$ is a
binary string of length $n$, where the $i$th bit is 1 if $g^{-1}(i) \in C_{\mathcal{T}}(v)$, and 0 otherwise.
We collect bit-strings from the input trees in a list $A$ and construct a TAG
node for each unique bit-string.

Collecting bit-strings: The algorithm starts by traversing each input tree in
post-order. When, after traversing its subtree, a vertex $v$ is visited, we compute $v$’s
bit-string as follows. If $v$ is a leaf with label $s \in S$, the bit-string for $v$ is simply
the string of length $n$ with a 1 at position $g(s)$ and 0s everywhere else. If $v$ is
an internal node, its bit-string is the OR of the bit-strings of $v$’s children. After
each bit-string is computed, it is stored in $A$. When the traversals of all $k$ input
trees are complete, $A$ has $O(nk)$ bit-strings.
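
The collection step can be sketched as follows. This is an illustrative sketch under assumptions that are not in the paper: trees are nested tuples with string leaves, bit-strings are stored as Python integers, and the bijection $g$ is 0-based (bit $g(s)$ is the bit with value $2^{g(s)}$). The names `leaves` and `collect_bitstrings` are hypothetical.

```python
def leaves(t):
    """Yield the leaf labels of a nested-tuple tree."""
    if isinstance(t, str):
        yield t
    else:
        for child in t:
            yield from leaves(child)


def collect_bitstrings(trees):
    taxa = sorted({x for t in trees for x in leaves(t)})
    g = {s: i for i, s in enumerate(taxa)}   # bijection g, 0-based in this sketch
    A = []

    def post(t):
        if isinstance(t, str):
            bits = 1 << g[t]                 # a single 1 at position g(s)
        else:
            bits = 0
            for child in t:                  # OR of the children's bit-strings
                bits |= post(child)
        A.append(bits)                       # stored once the subtree is done
        return bits

    for t in trees:
        post(t)
    return g, A


g, A = collect_bitstrings([(("a", "b"), "c")])
print([format(b, "03b") for b in A])
```

Using machine integers keeps the OR of two bit-strings a single operation for small $n$; for large $n$ each OR still costs time proportional to $n$, as in the analysis.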

Filtering unique bit-strings: To remove duplicates from $A$, we first sort it using
radix sort [7, Chapter 8]. Given $N$ $b$-bit numbers and any positive integer $r \le b$,
radix sort sorts these numbers in $O((b/r)(N + 2^r))$ time. In our case, $b = n$ and
$N = nk$, giving a running time of $O((n/r)(nk + 2^r))$. This quantity is minimized
when $r = \log(nk)$, giving a running time of $O(n^2 k / \log(nk))$.

After sorting $A$, we remove its duplicate bit-strings in a single linear scan.
This can be done in $O(n^2 k)$ time through standard methods.
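
For readers who want the substitution spelled out, the bounds above can be worked through as follows. The symbols $T_{\text{sort}}$ and $T_{\text{scan}}$ are introduced here only for this illustration.

```latex
\begin{align*}
T_{\text{sort}}(r) &= O\!\left(\frac{b}{r}\,\bigl(N + 2^{r}\bigr)\right)
  = O\!\left(\frac{n}{r}\,\bigl(nk + 2^{r}\bigr)\right)
  && (b = n,\ N = nk) \\
T_{\text{sort}}\bigl(\log_2(nk)\bigr) &= O\!\left(\frac{n}{\log(nk)}\,\bigl(nk + nk\bigr)\right)
  = O\!\left(\frac{n^{2}k}{\log(nk)}\right)
  && (2^{r} = nk) \\
T_{\text{scan}} &= O(nk \cdot n) = O(n^{2}k)
  && (nk \text{ strings of } n \text{ bits each})
\end{align*}
```

The scan compares each bit-string with its predecessor in sorted order, so its cost is the total length of $A$.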

TAG nodes: We construct a node in the TAG for each unique bit-string in A. 
For bit-strings corresponding to the leaf nodes, we also associate the appropriate 
label from S using function g. 

3.2 Adding Edges to the TAG 

Once the vertices of input trees have been mapped to the vertices of the TAG, we
add directed edges to the TAG. We traverse each input tree in post-order. When
a tree traversal visits an internal vertex $v$ of an input tree, we find the bit-strings
of $v$ and $v$’s children in $A$ and locate the nodes corresponding to them in the
TAG. We then add directed edges from the TAG node corresponding to $v$ to the
TAG nodes for $v$’s children.
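
The edge-adding pass can be sketched as below. Note one deliberate substitution: instead of looking bit-strings up in the sorted list $A$, this sketch uses a Python dictionary keyed by the bit-string, which is simpler to show but is not the lookup described above. The tree representation (nested tuples), the 0-based map `g`, and the name `build_tag` are assumptions made for illustration.

```python
def build_tag(trees, g):
    """g maps each taxon to a bit position; returns (node_of, edges)."""
    node_of = {}     # unique bit-string -> TAG node id
    edges = []       # (parent node id, child node id, tree index)

    def post(t, idx):
        if isinstance(t, str):
            kids, bits = [], 1 << g[t]
        else:
            kids = [post(child, idx) for child in t]
            bits = 0
            for kid in kids:
                bits |= kid
        node_of.setdefault(bits, len(node_of))   # create the node on first sight
        # directed edges from the node for v to the nodes for v's children
        edges.extend((node_of[bits], node_of[kid], idx) for kid in kids)
        return bits

    for idx, t in enumerate(trees):
        post(t, idx)
    return node_of, edges


g = {"a": 0, "b": 1, "c": 2}
node_of, edges = build_tag([(("a", "b"), "c"), ("a", ("b", "c"))], g)
print(len(node_of), "nodes,", len(edges), "edges")
```

Each input tree edge contributes exactly one TAG edge, tagged with the index of the tree it came from.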

Theorem 1. For a given collection of $k$ phylogenetic trees on $n$ labels, the TAG
can be built in $O(n^2 k)$ time.

Proof. Collecting bit-strings and then sorting them requires $O(n^2 k)$ time. The
remaining steps take time linear in the size of the input. □


4 Finding Consensus Trees using the TAG

Let $V$ be a collection of $k$ input phylogenetic trees with completely overlapping
leaf label set of size $n$. The strict consensus tree for $V$ is the tree whose clusters
are precisely those that appear in all the trees in $V$. The majority-rule consensus
tree for $V$ is the tree whose clusters are precisely those that appear in more than
half (i.e., the majority) of the trees in $V$. Here we show how to build the
majority-rule consensus tree for $V$ from the TAG for $V$. We then outline the modifications
needed to compute the strict consensus tree.

Algorithm MajorityRuleTree (Algorithm 1) builds the majority-rule tree
for $V$ by traversing the TAG $D$ for $V$. Let $D = (U, E)$ and let $f$ be the
vertex-labeling function. We assume that each vertex $v$ in $D$ stores the cardinality of $v$
(i.e., the number of taxa in $f(v)$) along with $\mathrm{count}(v)$, the number of trees
in $V$ in which cluster $f(v)$ appears. We also assume that multiple edges between
the same two vertices are replaced by a single edge. The next observation follows
from the fact that the input trees have completely overlapping leaf label sets.

Observation 1. $D$ has precisely one vertex $s$ with in-degree zero.

Let u be a node in D. Then, u is a majority node if count(u) > k/2. The 
clusters associated with majority nodes are precisely the clusters of the majority- 
rule tree. Let the nodes of the majority-rule tree correspond to the majority 
nodes of D. Next we develop an approach to hook up these nodes to actually 
build the majority-rule tree. 

Let $u$ and $v$ be nodes of $D$. Then, $u$ is a majority ancestor of $v$ if $u$ is a
majority node and there is a directed path from $u$ to $v$ in $D$. Algorithm 1 is based
on the following observation (parts of which were noted in [2]).

Observation 2. Let $u$ and $v$ be majority nodes in $D$. Then,

(i) if there is a directed path from $u$ to $v$ in
$D$, then $f(v) \subset f(u)$, and

(ii) if $v$ is the parent of $u$ in the majority-rule tree for $V$, then $v$ is the (unique)
minimum-cardinality majority ancestor of $u$; further, $f(u) \subset f(v)$.
Let $u$ be a node in $D$. The most recent majority ancestor of $u$ is the unique
minimum-cardinality majority ancestor of $u$. For each vertex $u \in U$, our algorithm
maintains two variables: $p(u)$, a reference to the smallest-cardinality majority
ancestor of $u$ seen thus far, and $m(u)$, the cardinality of $p(u)$. Initially, every
node $u$, except the node $s$ of in-degree zero, has $p(u) = s$, representing the initial
best estimate of the most recent majority ancestor of $u$. The algorithm revises
this estimate repeatedly until it converges on the correct value. After this process
is complete, it is a simple matter to assemble the majority-rule tree, since,
for each majority node $u$, $p(u)$ points to $u$’s parent in that tree.

Algorithm 1 processes the nodes of $D$ in topological order; this
ordering exists because $D$ is acyclic (by Lemma 1). When the algorithm visits
a node $u$, it examines each successor $v$, and considers two possibilities. If $u$ is
a majority node, then $u$ may become the new value of $p(v)$, while if $u$ is not a
majority node, $p(u)$ may become the new value of $p(v)$. By Observation 2, the
decision depends solely on node cardinalities.
Input: The TAG $D = (U, E)$ for a collection $V$ of trees over the same leaf set.
Output: The majority-rule tree for $V$.

Let $s$ be the unique vertex in $D$ with in-degree 0
foreach $u \in U - s$ do
    $m(u) = n$ ; $p(u) = s$
Perform a topological sort of $D - s$
Let $M \subseteq U$ be the set of majority nodes in $D$
foreach $u \in U - s$ in topological order do
    if $u \in M$ then
        $\mu = |f(u)|$ ; $\pi = u$
    else
        $\mu = m(u)$ ; $\pi = p(u)$
    foreach $v \in U$ such that $(u, v) \in E$ do
        if $m(v) > \mu$ then
            $m(v) = \mu$ ; $p(v) = \pi$

Let $T$ be the tree with vertex set $M$, where, for every $u \in M$, the parent of $u$ in
$T$ is $p(u)$
Let $\varphi$ be the function that maps each leaf $u$ of $T$ to $f(u)$
return $(T, \varphi)$


Algorithm 1: MajorityRuleTree($D$)
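
A runnable sketch of Algorithm 1 is given below, under assumptions that are not part of the paper: trees are nested tuples with string leaves, each TAG node is identified with its cluster (a `frozenset`), and the TAG is rebuilt from the trees inside the sketch so that the example is self-contained. The function names are hypothetical. The topological order is obtained by sorting nodes by decreasing cluster size, which is valid because every TAG edge points to a proper subset; this sort is not linear-time, so the sketch does not reproduce the stated running time.

```python
from collections import Counter


def tag_from_trees(trees):
    """Return (count, succ): cluster multiplicities and merged TAG edges."""
    count, succ = Counter(), {}

    def visit(t):
        if isinstance(t, str):
            kids, c = [], frozenset([t])
        else:
            kids = [visit(child) for child in t]
            c = frozenset().union(*kids)
        count[c] += 1
        succ.setdefault(c, set()).update(kids)
        return c

    for t in trees:
        visit(t)
    return count, succ


def majority_rule_tree(trees):
    """Return {majority cluster: parent cluster} for trees on one leaf set."""
    k = len(trees)
    count, succ = tag_from_trees(trees)
    s = max(count, key=len)                 # unique node of in-degree zero
    n = len(s)
    m = {u: n for u in count if u != s}     # m(u) = n
    p = {u: s for u in count if u != s}     # p(u) = s
    majority = {u for u in count if count[u] > k / 2}
    # Edges always point to proper subsets, so decreasing cluster size
    # is a valid topological order of D - s.
    for u in sorted(m, key=len, reverse=True):
        mu, pi = (len(u), u) if u in majority else (m[u], p[u])
        for v in succ[u]:
            if m[v] > mu:
                m[v], p[v] = mu, pi
    return {u: p[u] for u in majority if u != s}


trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "b"), ("c", "d")),
]
parent = majority_rule_tree(trees)
```

In this example only the cluster {a, b} (besides the trivial clusters) appears in more than half of the trees, so `parent` attaches {a, b}, c, and d to the root and a and b to {a, b}.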


Theorem 2. Given the TAG $D$ for a collection $V$ of $k$ phylogenetic trees on the
same $n$ leaves, the majority-rule consensus tree of $V$ can be computed in $O(nk)$
time.

Proof (Sketch). Correctness can be proved using Observation 2. To bound the
running time, note that topological sort takes time linear in the size of $D$, and
the main loop of Algorithm 1 examines each node and each edge once. Since the
number of edges and nodes in $D$ is $O(nk)$, the claimed time bound follows. □
The algorithm for the strict consensus tree is similar to Algorithm 1, with only
one significant difference: instead of dealing with majority nodes, it focuses on
strict nodes, that is, TAG nodes $u$ such that $\mathrm{count}(u) = k$. We omit the details,
and simply summarize the result.

Theorem 3. Given the TAG for a collection $V$ of $k$ phylogenetic trees on the
same $n$ leaves, the strict consensus tree of $V$ can be computed in $O(nk)$ time.
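
Since the details are omitted, the following is only a naive illustration of what a strict node is and how strict nodes nest; it is not the $O(nk)$ algorithm of Theorem 3. It assumes nested-tuple trees with string leaves, and finds each strict cluster's parent by a quadratic search for its smallest strict proper superset. The function name is hypothetical.

```python
from collections import Counter


def strict_consensus_parents(trees):
    """Return {strict cluster: parent cluster} by brute-force superset search."""
    k = len(trees)
    count = Counter()

    def visit(t):
        if isinstance(t, str):
            c = frozenset([t])
        else:
            c = frozenset().union(*(visit(child) for child in t))
        count[c] += 1
        return c

    for t in trees:
        visit(t)
    strict = [c for c in count if count[c] == k]    # count(u) = k
    parent = {}
    for c in strict:
        above = [d for d in strict if c < d]        # strict proper supersets
        if above:                                   # the root has none
            parent[c] = min(above, key=len)
    return parent


trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "b"), ("c", "d")),
]
parent = strict_consensus_parents(trees)
```

Strict clusters all occur together in every input tree, so the supersets of any one of them form a chain and the minimum is unique.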

5 Testing Compatibility using the TAG 

Let $\mathcal{T}$ and $\mathcal{T}'$ be two phylogenetic trees on $X$ and $X'$, respectively, where $X \subseteq X'$.
We say that $\mathcal{T}'$ displays $\mathcal{T}$ if, up to suppressing non-root nodes of degree
two, the minimum rooted subtree of $\mathcal{T}'$ that connects the elements of $X$ refines
$\mathcal{T}$, i.e., $\mathcal{T}$ can be obtained from it by contracting internal edges. Suppressing a
node of degree two means replacing that node and its incident edges by a single edge.
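
The notion of display can be checked with clusters alone, using the standard fact that the restriction of $\mathcal{T}'$ to $X$ refines $\mathcal{T}$ exactly when every cluster of $\mathcal{T}$ equals $C \cap X$ for some cluster $C$ of $\mathcal{T}'$. The sketch below relies on that characterization rather than on explicit edge contraction; the nested-tuple tree representation and the names `clusters` and `displays` are assumptions made for illustration.

```python
def clusters(tree):
    """Return (leaf set, set of all clusters) of a nested-tuple tree."""
    if isinstance(tree, str):
        c = frozenset([tree])
        return c, {c}
    root, found = frozenset(), set()
    for child in tree:
        c, sub = clusters(child)
        root |= c
        found |= sub
    found.add(root)
    return root, found


def displays(big, small):
    """True if tree `big` displays tree `small`."""
    x, h_small = clusters(small)
    x_big, h_big = clusters(big)
    # clusters of the restriction of `big` to the leaf set X of `small`
    restricted = {c & x for c in h_big if c & x}
    return x <= x_big and h_small <= restricted


big = ((("a", "b"), "c"), "d")
print(displays(big, (("a", "b"), "d")))   # {a,b} survives the restriction
print(displays(big, (("a", "d"), "b")))   # {a,d} is not a restricted cluster
```

An unresolved tree such as `("a", "b", "d")` is also displayed by `big`, since its only clusters are trivial.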

Let $V$ be the input collection of rooted phylogenetic trees. We say that $V$ is
compatible if there exists a phylogenetic tree $\mathcal{T}$, called a compatible supertree for
$V$, that simultaneously displays every tree in $V$. A classic result in phylogenetics
is that compatibility can be tested in polynomial time [?]. In this section, we
show that compatibility can be tested directly from the TAG for $V$.

We need some definitions. As before, we assume that multiple edges between
the same two TAG vertices are replaced by a single edge. The extended TAG is
the graph $D^*$ obtained from $D$ by adding undirected edges between every two
vertices $u, v \in U$ such that $f(u)$ and $f(v)$ are clusters corresponding to sibling
vertices in some tree in $V$. $D^*$ is a mixed graph, i.e., a graph that contains both
directed and undirected edges. See Fig. 3.
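
A minimal sketch of the extended TAG follows. It assumes nested-tuple trees with string leaves and identifies each node with its cluster; directed edges are stored as (parent, child) pairs and undirected sibling edges as two-element frozensets. The name `extended_tag` is hypothetical.

```python
def extended_tag(trees):
    """Return (nodes, directed, undirected) of the extended TAG as sets."""
    nodes, directed, undirected = set(), set(), set()

    def visit(t):
        if isinstance(t, str):
            c = frozenset([t])
        else:
            kids = [visit(child) for child in t]
            c = frozenset().union(*kids)
            for i, x in enumerate(kids):
                directed.add((c, x))                  # parent -> child
                for y in kids[i + 1:]:
                    undirected.add(frozenset([x, y])) # siblings in this tree
        nodes.add(c)
        return c

    for t in trees:
        visit(t)
    return nodes, directed, undirected


nodes, directed, undirected = extended_tag([(("a", "b"), "c")])
print(len(nodes), len(directed), len(undirected))
```

Storing edges in sets merges parallel edges automatically, matching the assumption that multiple edges between the same two vertices are replaced by one.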



Fig. 3. Two phylogenetic trees $\mathcal{T}_1$ and $\mathcal{T}_2$ with
edges are shown with dashed lines.



Let $D' = (U', E')$ be a mixed graph. An arc component of $D'$ is a maximal
sub-mixed graph $W$ of $D'$ such that for every two nodes $u$ and $v$ in $W$ there
is a path from $u$ to $v$ that consists only of directed edges, irrespective of their
directions. Let $v$ be a node of $D'$. The mixed graph obtained by deleting $v$ and
its incident directed and undirected edges is denoted by $D' \setminus v$. Let $R$ be a subset
of $U'$. We write $D' \setminus R$ to denote the (mixed) graph obtained from $D'$ by deleting
each node in $R$ from $D'$. The restriction of $D'$ to $R$, denoted by $D'|R$, is the
subgraph of $D'$ obtained by deleting each node in $U' \setminus R$ from $D'$.

Input: The extended TAG $D^*$ for a collection $V$ of phylogenetic trees.
Output: A phylogenetic tree $\mathcal{T}$ that displays each tree in $V$ or the statement
not compatible.

Let $S_0$ be the set of nodes in $D^*$ that have in-degree zero and no incident undirected edges
if $S_0$ is empty then
    return not compatible
if $S_0$ contains exactly one node with out-degree zero and label $\ell$ then
    return the tree composed of a singleton node with label $\ell$
Find the node sets $S_1, S_2, \ldots, S_m$ of the arc components of $D^* \setminus S_0$
Delete all undirected edges of $D^* \setminus S_0$ whose endpoints are in distinct arc
components
foreach $i \in \{1, 2, \ldots, m\}$ do
    Call Descendant($D^*|S_i$)
    if this call returns not compatible then
        return not compatible
    else
        Let $\mathcal{T}_i$ be the phylogeny returned by this call
return a tree with a root node and $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_m$ as its subtrees


Algorithm 2: Descendant($D^*$)

The extended TAG is closely related to the restricted descendancy graph
(RDG) [5,8]. The RDG has a unique node for each internal input tree node
along with $|S|$ leaf nodes. Let $u$ and $v$ be two input tree nodes, and let $u'$ and
$v'$ be the corresponding nodes in the RDG. If $u$ is a parent of $v$, then there is
a directed edge from $u'$ to $v'$ in the RDG. If $u$ and $v$ are siblings, then there is
an undirected edge between $u'$ and $v'$ in the RDG. Otherwise, there is no edge
between $u'$ and $v'$.

The extended TAG can be viewed as a compact version of the RDG of $V$.
Thus, a slight adaptation of the Descendant algorithm [5,8] enables us to
determine whether $V$ is compatible given its extended TAG $D^*$. The details of this
adaptation are shown in Algorithm 2. The algorithm first attempts to decompose
the problem into subproblems, each of which corresponds to one of the subtrees
of the compatible supertree. If no such decomposition exists, then $V$ is
incompatible. Otherwise, the algorithm identifies a collection of subproblems, each
associated with a different arc component, and recursively tests compatibility
for each subproblem.
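
The recursion described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation, and it makes several assumptions: trees are nested tuples with string leaves, every input tree has at least two leaves, TAG nodes are identified with their clusters, and `None` stands for the statement not compatible. One small deviation is labeled in the code: when only one arc component remains, its subtree is returned directly instead of being hung below a root with a single child. All function names are hypothetical.

```python
def extended_tag(trees):
    nodes, directed, undirected = set(), set(), set()

    def visit(t):
        if isinstance(t, str):
            c = frozenset([t])
        else:
            kids = [visit(child) for child in t]
            c = frozenset().union(*kids)
            for i, x in enumerate(kids):
                directed.add((c, x))
                undirected.update(frozenset([x, y]) for y in kids[i + 1:])
        nodes.add(c)
        return c

    for t in trees:
        visit(t)
    return nodes, directed, undirected


def arc_components(nodes, directed):
    """Components connected by directed edges, ignoring edge direction."""
    adj = {u: set() for u in nodes}
    for u, v in directed:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def descendant(nodes, directed, undirected):
    """Return a nested-tuple supertree, or None for 'not compatible'."""
    has_in = {v for _, v in directed}
    has_out = {u for u, _ in directed}
    has_und = {x for e in undirected for x in e}
    s0 = nodes - has_in - has_und
    if not s0:
        return None
    if len(s0) == 1 and not (s0 & has_out):      # a single leaf node
        (leaf,) = s0
        return next(iter(leaf))
    rest = nodes - s0
    d = {(u, v) for u, v in directed if u in rest and v in rest}
    subtrees = []
    for comp in arc_components(rest, d):
        dc = {(u, v) for u, v in d if u in comp}
        uc = {e for e in undirected if e <= comp}  # drops cross-component edges
        sub = descendant(comp, dc, uc)
        if sub is None:
            return None
        subtrees.append(sub)
    # Deviation (assumption): a root with a single child is suppressed.
    return subtrees[0] if len(subtrees) == 1 else tuple(subtrees)


ok = descendant(*extended_tag([(("a", "b"), "c"), (("a", "b"), "d")]))
bad = descendant(*extended_tag([(("a", "b"), "c"), (("a", "c"), "b")]))
```

For the first pair of trees the sketch returns a tree in which a and b form a subtree next to c and d; for the second pair, whose clusters {a, b} and {a, c} conflict, the set $S_0$ becomes empty in the recursive call and the result is `None`.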

Theorem 4. Let $D^*$ be the extended TAG for a collection $V$ of phylogenetic
trees. If $V$ is compatible, then Descendant($D^*$) returns a compatible supertree
for $V$; otherwise, Descendant($D^*$) returns the statement not compatible.

Proof (Sketch). Let $V'$ be the collection of phylogenetic trees after labeling the
internal nodes of the input trees in $V$ by their clusters. The order of labels in a
cluster does not matter; that is, we consider two labels identical if their respective
clusters are identical sets. Now the extended TAG of $V$ is the same as the RDG of
$V'$. The correctness of Algorithm 2 now follows from the proof of [5, Proposition
4] for $V'$. We omit the details for lack of space. □

Running time: Following [?, Proposition 3], we can show that if $V$ consists of $k$
fully resolved phylogenetic trees on a leaf set of size $n$, then the Descendant
subroutine runs in time $O(n^2 k^2)$. We conjecture that the running time can be
reduced to $O(nk \log^2 n)$ using the approach discussed in [?]. If, however, the
input trees are not fully resolved, the running time increases by a factor of $n$.

Remark. As we mentioned earlier, there are considerable similarities between the
extended TAG and the RDG. Nevertheless, the former has some advantages in
practice. While every internal node of a tree in $V$ gives rise to a distinct node in
the RDG, the extended TAG has one node for each unique cluster. For instance,
in the extreme case when $V$ contains $k$ identical phylogenetic trees on $S$, the
RDG has $O(nk)$ nodes, while the extended TAG contains only $O(n)$ nodes.
More typically, the trees in $V$ will share many clusters, and the likelihood of this
being the case is especially high when $k$ is much larger than $n$.
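
The size difference in the extreme case can be checked with a few lines of code. The sketch assumes nested-tuple trees with string leaves and counts RDG nodes as described above (one per internal input tree node, plus one per taxon) and extended TAG nodes as distinct clusters. The function name `node_counts` is hypothetical.

```python
def node_counts(trees):
    """Return (number of RDG nodes, number of extended TAG nodes)."""
    clusters, internal = set(), 0

    def visit(t):
        nonlocal internal
        if isinstance(t, str):
            c = frozenset([t])
        else:
            c = frozenset().union(*(visit(child) for child in t))
            internal += 1                    # every internal node is an RDG node
        clusters.add(c)                      # a TAG node only if the cluster is new
        return c

    for t in trees:
        visit(t)
    taxa = sum(1 for c in clusters if len(c) == 1)
    return internal + taxa, len(clusters)


tree = ((("a", "b"), "c"), "d")
rdg_nodes, tag_nodes = node_counts([tree] * 5)
print(rdg_nodes, tag_nodes)
```

With five copies of the same four-leaf tree, the RDG count grows with the number of copies while the TAG count stays at the number of clusters of a single tree.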


6 Discussion 

We have presented a formal definition of the TAG that does not depend on the 
order of the input trees. We have also presented a procedure for building TAGs, 
and described how to use TAGs to find consensus trees and to determine whether 
a collection of phylogenetic trees is compatible. 

Extending TAGs to include potentially thousands of input trees from across
the tree of life leads to several future challenges; two major ones are incorporating
trees at different taxonomic levels and finding ways to synthesize conflicting
phylogenetic input trees. It may be possible to address the second of these
challenges using ideas from the AncestralBuild algorithm [5,8]. Dealing with
conflict among the input trees is also essential for processing large-scale
phylogenetic data sets. Although a visual inspection of the TAG provides some insight
into the areas of conflict, approaches to quantify phylogenetic conflict within the
TAG may provide valuable insight into mechanisms causing phylogenetic
incongruence among biological datasets and help guide future phylogenetic research.
The synthesis approach of Smith et al. [19] relies on a subjective ranking of the
input trees. Potentially, a MinCutSupertree approach, like the
MultiLevelSupertree algorithm [3], could be applied to a TAG to provide an efficient and
effective approach for synthesizing a tree of life.